- ADDPAGE
- Dimitri Vulis
- Department of Mathematics
- CUNY Graduate Center
- 33 W 42 St
- New York, NY 10036-8099
- USA
-
- dlv@cunyvms1.bitnet
-
- June 22, 1989
-
- 1. General overview:
-
- MS/PC DOS 3.3 (and later) is distributed with the following code pages:
-
- 437 - USA
- 865 - Norway
- 860 - Portuguese
- 863 - Canada-French
- 850 - Multilingual
-
- This package will allow you to patch your copy of DOS to add more code pages. A
- Cyrillic code page, arbitrarily numbered 880, corresponding to the standards ISO
- 8859 part 5, ECMA 113 and GOST 19768-74 is provided with the package. There is
- also an INCOMPLETE Greek code page, arbitrarily numbered 890, corresponding to
- the standards ECMA 118 and ISO 8859-7 (and hence the appropriate ELOT standard)
- (see below).
-
- Hardware requirements: you have to have EGA, VGA or MCGA to use code pages. DOS
- code pages don't work with CGA or Hercules.
-
- Once the Cyrillic code page is loaded and selected, you will be able to
- correctly display on your screen documents that contain Cyrillic text coded in
- accordance with the above standards, in particular GOST-coded (Soviet) text.
-
- In addition, a TSR program called KEYBRU will redefine your keyboard into a
- standard Russian layout when ScrollLock is pressed. This program will allow you
- to type in Russian text. Unfortunately, there is no easy way to patch DOS
- (KEYBOARD.SYS) to add an additional keyboard layout.
-
- In order to print out Cyrillic documents you need standard font(s) for your
- particular printer. A downloadable font for Epson FX (9-pin) is included. One of
- the updates for this program will include downloadable fonts for HP LJ. A
- technique for printing out Cyrillic documents using TeX and American Math
- Society's Cyrillic fonts is provided.
-
- A Russian version of TeX, Donald Knuth's typesetting system, is currently being
- developed at various sites. A set of Russian hyphenation patterns is included.
-
- 2. The code pages.
-
- ISO 8859-5/ECMA-113 includes the characters necessary to handle Russian,
- Bulgarian, Byelorussian, Macedonian, Serbocroatian and Ukrainian text.
-
- The positions of the Russian letters coincide with GOST. The positions of
- Serbian letters in ISO 8859-5 do NOT coincide with the Jugoslavian standard JUS
- I.B1.003, ISO registration 146/147.
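-
- As a rough illustration of this layout, case conversion reduces to two fixed
- offsets. This is only a sketch: it assumes that code page 880 follows ISO 8859-5
- exactly in these positions, and iso5_tolower is a made-up name, not part of
- this package.

```c
/* Lowercase one ISO 8859-5 byte; any other byte is returned unchanged.
   Assumption: the code page matches ISO 8859-5 in the positions used here. */
unsigned char iso5_tolower(unsigned char c)
{
    if (c >= 0xB0 && c <= 0xCF)               /* Russian capitals А..Я */
        return (unsigned char)(c + 0x20);     /* small letters а..я at 0xD0-0xEF */
    if (c >= 0xA1 && c <= 0xAF && c != 0xAD)  /* Ё and the non-Russian capitals */
        return (unsigned char)(c + 0x50);     /* 0xAD is the soft hyphen, left alone */
    if (c >= 'A' && c <= 'Z')                 /* plain ASCII capitals */
        return (unsigned char)(c + ('a' - 'A'));
    return c;
}
```

- The Russian capitals occupy 0xB0-0xCF with the small letters 0x20 above them;
- the extra capitals needed for the other languages occupy 0xA1-0xAF, with their
- small forms 0x50 above.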
-
- Several additional rasters that may be useful for Russian text (obsolete
- letters and guillemets) have been placed in the unused positions. Depending on
- your applications, you may want to replace some Serbian characters with them.
-
- I made an effort to make the shapes of Russian letters distinct from the usually
- identical Latin letters, e.g. Russian А and Latin A. The shapes generally resemble
- those produced by Soviet DP equipment. If you improve the shape of any of the
- letters in 880.ASM, please let me know.
-
- The Greek code page includes the characters necessary to handle Modern Greek
- text. The only accents are diaeresis and tonos. It does not include asper,
- lenis, acute, grave, circumflex accents and their combinations that are
- necessary for Classical Greek.
-
- I lack the expertise to draw the remaining Greek rasters. Although many people
- have expressed interest in a Greek code page, no one has been willing to write
- the rasters. I am distributing 890.ASM with most characters missing, hoping
- that someone will volunteer to complete it.
-
- 3. The keyboard driver.
-
- Only the Russian keyboard driver is provided. Scroll Lock switches between
- Russian and Latin keyboards. The layout is:
-
- № § / " : , . ; ? ò < > (Alt-1, etc)
- 1 2 3 4 5 6 7 8 9 0 - =
-
- й ц у к е н г ш щ з х ъ
- q w e r t y u i o p [ ]
-
- ф ы в а п р о л д ж э
- a s d f g h j k l ; '
-
- я ч с м и т ь б ю ё
- z x c v b n m , . /
-
- The Russian letters are stored as code page 880 characters, so they are
- displayed as Russian letters only if you've installed the code page properly.
-
- Most newer Soviet keyboards have the /? key in the same place as US keyboards
- do. I put the ё character here, like on a typewriter, since it is used by
- students of Russian. If you don't need this character, you may want to edit
- KEYBRU.ASM and comment out the 2 relevant lines. Soviet keyboards have 2 extra
- keys (as do most European keyboards): <> and *. For this reason, in Russian
- mode some punctuation marks can only be obtained using Alt- and a key in the
- upper row.
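-
- The three letter rows of this layout amount to a lookup table. Here is a
- minimal sketch in C, assuming ISO 8859-5 character codes and ignoring Shift,
- Caps Lock and the Alt row. translate_key and both tables are illustrative
- names; KEYBRU.ASM itself may well be organized differently.

```c
#include <string.h>

/* Latin keys of the three letter rows, in the order shown in the layout. */
static const char latin_keys[] = "qwertyuiop[]asdfghjkl;'zxcvbnm,./";

/* Assumed ISO 8859-5 codes of the Russian letters on those keys. */
static const unsigned char russian_codes[] = {
    0xD9, 0xE6, 0xE3, 0xDA, 0xD5, 0xDD, 0xD3, 0xE8, 0xE9, 0xD7, 0xE5, 0xEA, /* й ц у к е н г ш щ з х ъ */
    0xE4, 0xEB, 0xD2, 0xD0, 0xDF, 0xE0, 0xDE, 0xDB, 0xD4, 0xD6, 0xED,       /* ф ы в а п р о л д ж э */
    0xEF, 0xE7, 0xE1, 0xDC, 0xD8, 0xE2, 0xEC, 0xD1, 0xEE, 0xF1              /* я ч с м и т ь б ю ё */
};

/* Return the Russian code for a Latin key while Scroll Lock is on;
   keys outside the table, and every key in Latin mode, pass through. */
unsigned char translate_key(char key, int scroll_lock_on)
{
    const char *hit;

    if (!scroll_lock_on || key == '\0')
        return (unsigned char)key;
    hit = strchr(latin_keys, key);
    return hit ? russian_codes[hit - latin_keys] : (unsigned char)key;
}
```

- With Scroll Lock on, translate_key('q', 1) yields 0xD9 (й); with it off, the
- same call returns 'q' unchanged.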
-
- Like any TSR (terminate and stay resident) program, KEYBRU may interfere with
- the normal operation of your computer. Whenever this happens, try changing the
- order in which your TSRs are loaded.
-
- Caps Lock does not work properly for several keys. Also Scroll Lock is examined
- when the character is dequeued, not when it is enqueued, which changes nothing,
- unless you type ahead while switching between Russian and Latin modes. These
- problems would not occur if I could include the keyboard in KEYBOARD.SYS.
-
- Later versions of this package may include standard Serbian, Ukrainian, etc,
- keyboard layouts as well. Please let me know if you know what these layouts
- are.
-
- The standard Greek keyboard layout includes dead keys, so it really has to be
- included in KEYBOARD.SYS.
-
- 4. Installation.
-
- Notation: DOS directory is usually the root. Work directory may be a floppy
- disk (A:). Make sure the following files are available:
-
- EGA.CPI in the DOS directory (from your DOS distribution disk).
- DISPLAY.SYS in the DOS directory (from your DOS distribution disk).
-
- MODE.COM on your PATH (from your DOS distribution disk).
- KEYBRU.COM on your PATH (from this archive).
-
- 880.CP and ADDPAGE.BAS in the work directory (from this archive).
-
- Run the basic program ADDPAGE as follows:
-
- ------------------->Type:
-
- C> BASIC A:ADDPAGE
- CPI filename: \DOS\EGA.CPI (or just \EGA.CPI etc)
- Target codepage: 880
- In or Out? I
- CP filename: A:880.CP (or wherever)
- Code page not in file---replacing...
- <long wait>
- Ok
- SYSTEM
- -----
-
- Erase 880.CP and ADDPAGE.BAS from your hard disk; you don't need them anymore.
-
- Use your favorite editor to add the following to your CONFIG.SYS:
-
- DEVICE=C:\DOS\DISPLAY.SYS CON:=(EGA,437,(1,3))
-
- If you are using EGA, not VGA, change the last 3 to 2.
-
- (You have to say 'EGA' even if you're really using a VGA. 1 is the number of
- code pages you are going to load; increase this number if you want to load
- other pages. 3 is the number of font variants; 3 is for VGA (16, 14 and 8
- pixels high); 2 is for EGA (only 14 and 8). See your DOS manual if you need
- more info.)
-
- Use your favorite editor to add the following to your AUTOEXEC.BAT:
-
- MODE CON CP PREPARE=((880) \DOS\EGA.CPI)
- MODE CON CP SELECT=880
- KEYBRU
-
- Reboot.
-
- You can switch the code pages anytime using
- MODE CON CP SELECT=437
- for U.S. page and
- MODE CON CP SELECT=880
- for Cyrillic page. CHCP won't work.
-
- 5. Printing
-
- To print Russian text on a 9-pin Epson FX printer, first send the downloadable
- font in EPSON9.FNT to the printer: COPY/B EPSON9.FNT PRN. If you print from
- within a word processor, make sure it does not reset the printer and delete the
- fonts before you print.
-
- The enclosed program TRR.C will translate Russian characters in STDIN into calls
- to AMS Cyrillic fonts for TeX in STDOUT. It is assumed that \mcyr is defined as
- a font or font family. (Just say \font\mcyr=mcyr10, if you're not sure.) This
- has been tested with both Plain TeX and LaTeX. Unfortunately, this approach is
- not compatible with TeX's hyphenation algorithm.
-
- 6. Hyphenation
-
- The file RUSSHYPH.PAT contains an improved version of the patterns presented in
- my M.A. thesis, "An Implementation of Liang's Algorithm for the Russian
- language", submitted to CCNY in October of 1988. The patterns find all the valid
- and no invalid hyphens in a 50,000+ word dictionary (including inflections). The
- main improvements, compared with the thesis, are:
-
- a) I fixed a few incorrect hyphenations, e.g., по-мнить, from пом-нить, etc. I
- also keyed in the balance of the dictionary, and added a few words supplied by
- A. Samarin (see below).
-
- b) All the technical terms that I could find that are borrowed from German,
- Dutch, English, etc, are hyphenated correctly.
-
- c) The patterns won't split a single vowel off a part of many compound words.
- Thus, прямоу-гольник, би-олог, нео-бычный, не-офашизм, etc, are now suppressed.
- (Such breaks are not strictly illegal, but certainly offensive, and a good break
- is just 1 letter away.)
-
- d) The patterns will handle many common abbreviations, such as борт-проводница,
- парт-учеба, комс-орг, etc. (Most such words are not considered to be part of the
- language, but occur often in certain kinds of texts.)
-
- Although (c) and (d) sound like neat tricks, such words usually fit one of the
- common patterns. I used a slightly modified PATGEN, running on a Kouwei computer
- from Barry Hu, Microstar, to generate the patterns. PATGEN is an extremely
- powerful tool, and I would never have generated such good patterns without it.
- Like PATGEN says,
-
- 127685 good, 0 bad, 0 missed
-
- In April of 1989 I was informed by Alexander Samarin of the Institute for High
- Energy Physics in Serpukhov, USSR, currently at CEARN, that an algorithm for
- automatic hyphenation of Russian words was developed at IHEP and a preprint was
- published in 1983. I was unable to get the paper or the algorithm. Alexander
- Samarin kindly sent me a file with about 21,000 inflected Russian words, for
- which I am very grateful.
-
- These patterns are 'final', in the sense that I don't expect to change or improve
- them in the future. I am not aware of any Russian words that are not hyphenated
- correctly by the patterns. It is possible to manufacture abbreviations (d) that
- won't be broken up completely, although invalid breaks are unlikely. It is very
- hard to find a compound word (c) where a single vowel might be split off. Of
- course, if you use the patterns and find any word that's not fully and correctly
- hyphenated, please let me know.
-
- The patterns, meant as input to Liang's algorithm, consist of strings of
- letters and digits, where a digit placed between two letters indicates a
- `hyphenation value' for its position. Odd values permit breaks; even values
- (including zero, assumed when the digit is omitted) prohibit breaks. The text
- processing program finds all the patterns whose letters match part of the word,
- takes the maximum hyphenation value for each position between letters and examines
- its parity to exhibit the legal breaks.
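-
- The procedure just described can be sketched in a few lines of C. This is an
- illustration only: hyphenate and parse_pattern are made-up names, the test
- patterns are a handful of English ones rather than anything from RUSSHYPH.PAT,
- and the sketch leaves out word-boundary patterns and minimum fragment lengths.

```c
#include <ctype.h>
#include <string.h>

#define MAX_WORD 64   /* assumed limit on word and pattern length */

/* Split a pattern such as "hy3ph" into its letters ("hyph") and the values
   between them ({0,0,3,0,0}); an omitted digit counts as zero. */
static size_t parse_pattern(const char *pat, char *letters, int *values)
{
    size_t n = 0;

    values[0] = 0;
    for (; *pat; pat++) {
        if (isdigit((unsigned char)*pat)) {
            values[n] = *pat - '0';
        } else {
            letters[n++] = *pat;
            values[n] = 0;
        }
    }
    letters[n] = '\0';
    return n;
}

/* Copy word to out, inserting '-' wherever the maximum value is odd.
   out must hold at least 2 * strlen(word) bytes; patterns are assumed
   to be shorter than MAX_WORD characters. */
void hyphenate(const char *word, const char *const *patterns, size_t npat,
               char *out)
{
    int best[MAX_WORD + 1] = {0};
    char letters[MAX_WORD];
    int values[MAX_WORD + 1];
    size_t wlen = strlen(word), p, start, j;

    if (wlen > MAX_WORD) {            /* too long for the sketch: no breaks */
        strcpy(out, word);
        return;
    }
    for (p = 0; p < npat; p++) {
        size_t plen = parse_pattern(patterns[p], letters, values);

        for (start = 0; start + plen <= wlen; start++) {
            if (strncmp(word + start, letters, plen) != 0)
                continue;
            for (j = 0; j <= plen; j++)       /* keep the maximum per position */
                if (values[j] > best[start + j])
                    best[start + j] = values[j];
        }
    }
    for (j = 0; j < wlen; j++) {
        if (j > 0 && best[j] % 2 == 1)        /* odd value: a legal break */
            *out++ = '-';
        *out++ = word[j];
    }
    *out = '\0';
}
```

- With the patterns hy3ph, he2n, hena4, hen5at, 1na, n2at, 1tio, 2io and o2n,
- the word "hyphenation" comes out as hy-phen-ation: the maxima between the
- letters are 0 3 0 0 2 5 4 2 0 2, and only the odd values 3 and 5 permit a break.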
-
- I have been told that both Microsoft Word and WordPerfect use Liang's algorithm
- for hyphenation, but have English patterns hardwired.
-
- 7. Possible problems
-
- Q: The installation is too complex.
-
- A: I cannot give out a patched EGA.CPI because it's copyrighted. Ask someone
- to help.
-
- Q: BASIC won't run on this machine.
-
- A: Do the installation on another machine that has BASIC, and then copy
- EGA.CPI.
-
- Q: KEYBRU conflicts with other TSRs.
-
- A: Sigh. Try changing the order in which the TSRs are loaded. A better solution
- would be to add the Russian keyboard layout to KEYBOARD.SYS and to use a
- vanilla DOS KEYB command; alas, I was unable to do it.
-
- Q: Why does WordPerfect misinterpret some of the Russian letters?
-
- A: I don't know. Typing the lowercase Russian р (code 224) causes WP to look for a file
- "alth.wmp". This is a problem with WP, not with the keyboard driver.
-
- Q: Is it possible to change the codes for some of the letters?
-
- A: Yes, you can alter 880.ASM, and MASM/LINK/EXE2BIN it to obtain 880.CP, and
- then repeat the installation. The resulting code page will not be compatible
- with 880, so it should be given a different number. This is not a good idea.
-
- 8. Technical remark
-
- Here is some C code that uses code pages:
-
- #include <stdio.h>
- #include <dos.h>
-
- int main(void)
- {
-     union REGS regs;
-     struct SREGS sregs;
-     unsigned filhandl = 0;           /* STDIN, hopefully the console */
-     unsigned buf[64];                /* receives the code page list */
-     unsigned far *foo = (unsigned far *)buf; /* far, so any memory model works */
-     unsigned hwcpcount, prepcpcount, i;
-
-     segread(&sregs);                 /* start from valid segment registers */
-
-     regs.x.bx = filhandl;
-     regs.h.ah = 0x44;                /* IOCTL */
-     regs.h.al = 0x00;                /* get info */
-     intdosx(&regs, &regs, &sregs);
-     if (!(regs.x.dx & 0x4000))       /* code page supported bit */
-     {
-         printf("Code page not supported by STDIN\n");
-         return 1;
-     }
-
-     regs.h.ah = 0x44;                /* IOCTL */
-     regs.h.al = 0x0c;                /* generic character IOCTL */
-     regs.x.bx = filhandl;
-     regs.h.ch = 0x03;                /* category: console */
-     regs.h.cl = 0x6b;                /* query prepared code pages */
-     sregs.ds = FP_SEG(foo);          /* DS:DX -> buffer that DOS fills in */
-     regs.x.dx = FP_OFF(foo);
-     intdosx(&regs, &regs, &sregs);
-     if (regs.x.cflag)                /* carry set: the request failed */
-     {
-         printf("Cannot query code pages, error %u\n", regs.x.ax);
-         return 1;
-     }
-
-     foo++;                           /* skip the # bytes returned */
-     hwcpcount = *foo++;
-     printf("%u hardware pages: ", hwcpcount);
-     for (i = 0; i < hwcpcount; i++)
-         printf("%u ", *foo++);
-     prepcpcount = *foo++;
-     printf("%u prepared pages: ", prepcpcount);
-     for (i = 0; i < prepcpcount; i++)
-         printf("%u ", *foo++);
-     printf("\n");
-     return 0;
- }
-
- 9. Credits, acknowledgements, etc
-
- The contents of this archive are placed in public domain; all copyright is
- waived. You may use it as you please.
-
- If you find this package useful, please let me know at:
-
- DLV@CUNYVMS1.BITNET
-
- or:
-
- Dimitri Vulis
- Department of Mathematics
- CUNY Graduate Center
- New York, NY 10036-8099
- U.S. of A.
-
- (Note: never use my old "529 W 111th" address listed in some directories!)
-
- I may then notify you of updates (this is more likely if you provide an e-mail
- address reachable from BITNET).
-
- Feel free to comment on the letter shapes. I always appreciate constructive
- criticism.
-
- I would like to thank Burton Randol, Giorgio Mantzivis, Johann van
- Wingen, Donald Parsons and my father L.N.Klyukvin for their help with
- this project.
-
- You may try contacting your DOS OEM and asking them to include the Cyrillic and
- Greek code pages in their standard DOS distribution.